COE-227: Canonical Data Store and Collection Pipeline by kumanday · Pull Request #5 · trilogy-group/StackPerf

kumanday · 2026-03-21T03:17:59Z

Implements the canonical benchmark database, collectors, normalization, and rollups for COE-227.

Changes

Database schema with 9 canonical tables (providers, harness_profiles, variants, experiments, task_cards, sessions, requests, metric_rollups, artifacts)
Repository layer with 9 repositories for all entities
Session service for lifecycle management
LiteLLM collector and normalizer for request ingestion
Prometheus collector and rollup computations
Unit and integration test suites (57 tests total)

Test Results

All 57 tests pass (40 unit + 17 integration)

Acceptance Criteria

All 15 acceptance criteria met as documented in the Linear issue.

Implements COE-227: - Database schema with 9 canonical tables (providers, harness_profiles, variants, experiments, task_cards, sessions, requests, metric_rollups, artifacts) - Repository layer with 9 repositories for all entities - Session service for lifecycle management - LiteLLM collector and normalizer for request ingestion - Prometheus collector and rollup computations - Unit and integration test suites Key design decisions: - SQLAlchemy async ORM with PostgreSQL UUID types - Foreign key constraints for referential integrity - Unique constraints on session_id, proxy_key_alias, litellm_call_id - Percentile calculations with empty list handling for NumPy 2.x Tests: 40 unit tests + 17 integration tests all pass

coderabbitai · 2026-03-21T03:18:04Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 220915a2-2499-4a45-8a13-1431a77af23c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch leonardogonzalez/coe-227-canonical-data-store-and-collection-pipeline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82763a0ab8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-21T03:23:45Z

src/collectors/rollups.py

+        session_rollups = await self.session.execute(
+            select(MetricRollupModel).where(
+                MetricRollupModel.scope_type == RollupScopeType.SESSION,
+                MetricRollupModel.scope_id.in_([s.session_id for s in sessions]),
+            )


Aggregate variant latency from requests, not session medians

Variant-level median_latency_ms/p95_latency_ms are built from one precomputed median per session, so sessions with very different request counts are weighted equally. In a common case like 100 fast requests in one session and 1 slow request in another, this reports a 50/50 blend instead of the true request-level distribution, which will skew the core variant comparisons this service is meant to produce.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-21T03:23:45Z

src/collectors/litellm_collector.py

+            session_id=UUID(session_id) if session_id else None,
+            experiment_id=UUID(correlation_keys["experiment_id"]) if correlation_keys.get("experiment_id") else None,
+            variant_id=UUID(correlation_keys["variant_id"]) if correlation_keys.get("variant_id") else None,
+            provider_id=UUID(correlation_keys["provider_id"]) if correlation_keys.get("provider_id") else None,


Populate experiment and variant IDs after alias-based session lookup

When a LiteLLM row is correlated only through proxy_key_alias, _resolve_session() can recover the session, but this constructor still leaves experiment_id and variant_id null because it only trusts raw tags. compute_experiment_rollups() later filters on RequestModel.experiment_id, so alias-only traffic disappears from experiment totals and latency aggregates even though the matched session already tells us which experiment/variant it belongs to.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-21T03:23:46Z

src/collectors/litellm_collector.py

+        if await self.request_repo.exists_by_litellm_call_id(
+            self.session, litellm_call_id
+        ):


Make duplicate request ingestion atomic

This existence check does not make ingestion idempotent under concurrency. If two collector workers or retries ingest the same litellm_call_id at the same time, both can observe False here and then race into RequestRepository.create(), where the loser hits the unique constraint on commit and fails the collection run instead of treating the duplicate as a no-op.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-21T03:23:46Z

pyproject.toml

+]
+
+[project.scripts]
+bench = "cli.main:cli"


Point the bench script at a real CLI entrypoint

Installing this package will register bench, but the entrypoint targets cli.main:cli and this commit only adds src/cli/__init__.py; there is no src/cli/main.py or exported cli object anywhere in the repo. Anyone following the documented pip install -e . flow will get an import error before the CLI can start.

Useful? React with 👍 / 👎.

…vider change

…permissions

…relation keys (#14) * COE-306: Build LiteLLM collection job for raw request records and correlation keys - Implement LiteLLMCollector with idempotent ingest and watermark tracking - Add CollectionDiagnostics for missing field reporting - Add CollectionJobService in benchmark_core/services.py - Preserve session correlation keys in metadata - Add comprehensive unit tests (29 tests, all passing) Co-authored-by: openhands <[email protected]> * Update workpad: mark all tasks complete, add validation evidence * Update workpad: document GitHub PR blocker * COE-306: Update workpad - PR creation blocked, ready for human action * COE-306: Update workpad - document active GitHub PR blocker * COE-306: Final workpad update - sync HEAD commit hash * COE-306: Update workpad for retry #2 - document PR creation blocker * COE-306: Final workpad - document complete blockers status * COE-306: Final workpad - correct HEAD commit hash * COE-306: Retry #3 - Update workpad with PR creation blocker status * COE-306: Retry #4 - Update workpad with retry status * COE-306: Final retry #4 workpad - confirmed PAT permission blocker, all fallbacks exhausted * COE-306: Add PR description for manual creation * COE-306: Final workpad - ready for manual PR creation * COE-306: Retry #5 - Document PR creation blocker status after LLM provider change * COE-306: Retry #6 - Updated workpad with retry #6 blocker status * COE-306: Retry #7 - Update workpad with retry #7 confirmation * COE-306: Final workpad - confirmed PAT blocker, ready for manual PR * COE-306: Session #8 - PR #14 created successfully, workpad updated * COE-306: Update environment stamp to c083393 * COE-306: Address PR feedback - fix watermark logic, rename field, add evidence - Fix watermark/start_time interaction: use max() instead of unconditional override - Rename requests_new to requests_normalized for clarity - Remove WORKPAD.md from repo (add to .gitignore) - Add runtime evidence via scripts/demo_collector.py - Add test for watermark/start_time interaction - Update PR_DESCRIPTION.md with Evidence section --------- Co-authored-by: openhands <[email protected]>

* COE-309: Implement session manager service and CLI commands - Add SessionService with create_session(), get_session(), finalize_session() - Add CredentialService for proxy credential management - Implement session CLI commands: create, list, show, finalize, env - Add git metadata capture (branch, commit, dirty state) to sessions - Implement SQLAlchemySessionRepository for session persistence - Implement SQLAlchemyRequestRepository for request persistence - Add comprehensive tests for all components Co-authored-by: openhands <[email protected]> * COE-309: Restructure services into package format - Move services.py to services/ package with separate modules - Create session_service.py for SessionService - Create credential_service.py for CredentialService - Update CLI imports to use new structure Co-authored-by: openhands <[email protected]> * COE-309: Fix linting and formatting issues - Apply ruff formatting and import sorting - Fix exception handling with 'from e' for B904 compliance - Fix variable naming (session_local vs SessionLocal) - Remove unused imports Co-authored-by: openhands <[email protected]> * COE-309: Address PR feedback - remove dead code and handle detached HEAD - Remove unused _get_db_session() function from session.py - Add (detached) marker for detached HEAD state in git.py - All 108 tests passing * COE-309: Address PR feedback - move asyncio import to top of file * COE-309: Fix late imports - move all imports to top of session.py * COE-309: Update workpad with PR feedback response status * COE-309: Final workpad update with commit SHA and merge status * COE-309: Add CLI evidence document (EVIDENCE_COE-309.md) Addresses PR feedback about missing CLI command execution evidence. Document demonstrates all 5 session commands with expected outputs, git metadata capture, and test evidence. * COE-309: Update workpad for Retry #3 - CLI evidence committed * COE-309: Update workpad for Retry #4 - all feedback addressed - Updated status to reflect Retry #4 completion - Documented PR review status (COMMENTED, not CHANGES_REQUESTED) - Verified all inline comments resolved - Confirmed tests passing (108/108) * COE-309: Final workpad update - PR approved, awaiting merge * COE-309: Fix type errors in git.py and lint issues - quality checks passing * COE-309: Update workpad for Retry #5 - PR ready to merge, blocked by permissions * COE-309: Fix exports and remove unused import after merge Summary: - Update services/__init__.py to export CollectionJobService and CollectionJobResult - Remove unused Request import from session_service.py Rationale: - Merge from origin/main added CollectionJobService to session_service.py - Tests require CollectionJobService to be accessible from benchmark_core.services - F401 lint error flagged unused Request import Tests: - 136 unit tests passing with PYTHONPATH set correctly Co-authored-by: Codex <[email protected]> --------- Co-authored-by: openhands <[email protected]> Co-authored-by: Codex <[email protected]>

kumanday added the symphony Symphony orchestrated task label Mar 21, 2026

chatgpt-codex-connector bot reviewed Mar 21, 2026

View reviewed changes

kumanday force-pushed the main branch from b5b1de4 to 209d1ef Compare March 25, 2026 22:55

kumanday added a commit that referenced this pull request Mar 26, 2026

COE-306: Retry #5 - Document PR creation blocker status after LLM pro…

f557306

…vider change

kumanday pushed a commit that referenced this pull request Mar 27, 2026

COE-309: Update workpad for Retry #5 - PR ready to merge, blocked by …

0724b3c

…permissions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COE-227: Canonical Data Store and Collection Pipeline#5

COE-227: Canonical Data Store and Collection Pipeline#5
kumanday wants to merge 1 commit intomainfrom
leonardogonzalez/coe-227-canonical-data-store-and-collection-pipeline

kumanday commented Mar 21, 2026

Uh oh!

coderabbitai bot commented Mar 21, 2026

Review skipped

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kumanday commented Mar 21, 2026

Changes

Test Results

Acceptance Criteria

Uh oh!

coderabbitai bot commented Mar 21, 2026

Review skipped

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant